Memory Management Support for Multi-Programmed Remote Direct Memory Access (RDMA) Systems
Current operating systems offer basic support for network interface controllers (NICs) supporting remote direct memory access (RDMA). Such support typically consists of a device driver responsible for configuring communication channels between the device and user-level processes but not involved in data transfer. Unlike standard NICs, RDMA-capable devices incorporate significant memory resources for address translation purposes. In a multi-programmed operating system (OS) environment, these memory resources must be efficiently shareable by multiple processes. For such sharing to occur in a fair manner, the OS and the device must cooperate to arbitrate access to NIC memory, similar to the way CPUs and OSes cooperate to arbitrate access to translation lookaside buffers (TLBs) or physical memory. A problem with this approach is that today's RDMA NICs are not integrated into the functions provided by OS memory management systems. As a result, RDMA NIC hardware resources are often monopolized by a single application. In this paper, I propose two practical mechanisms to address this problem: (a) use of RDMA only in kernel-resident I/O subsystems, transparent to user-level software; (b) an extended registration API and a kernel upcall mechanism delivering NIC TLB entry replacement notifications to user-level libraries. Both options are designed to reinstate the multiprogramming principles that are violated in early commercial RDMA systems.
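The upcall mechanism in option (b) can be pictured as a user-level registration cache that the kernel notifies whenever the NIC replaces a TLB entry. The following is a minimal sketch of that interaction; all names (`RegistrationCache`, `on_nic_tlb_eviction`, and the handle values) are invented for illustration and do not correspond to any real RDMA verbs API.

```python
# Hypothetical sketch of option (b) above: a user-level library keeps a
# cache of buffer registrations, and a kernel upcall tells it when the
# NIC has replaced a TLB entry so the stale registration can be dropped
# and later re-registered on demand, keeping NIC memory fairly shared.

class RegistrationCache:
    def __init__(self):
        self._entries = {}          # buffer id -> registration handle

    def register(self, buf_id, handle):
        self._entries[buf_id] = handle

    def lookup(self, buf_id):
        # Returns a handle only if the NIC is still believed to hold
        # the translation; None forces re-registration by the caller.
        return self._entries.get(buf_id)

    def on_nic_tlb_eviction(self, buf_id):
        # Kernel upcall: the NIC replaced this buffer's TLB entry, so
        # the cached registration is stale and must be invalidated.
        self._entries.pop(buf_id, None)

cache = RegistrationCache()
cache.register("buf0", handle=42)
assert cache.lookup("buf0") == 42
cache.on_nic_tlb_eviction("buf0")    # delivered by the kernel
assert cache.lookup("buf0") is None  # stale entry was invalidated
```

The design point mirrors how a process reacts to a software TLB shootdown: the library never assumes its registrations are pinned forever, which is precisely what allows the OS to arbitrate NIC memory among competing processes.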
Recommended from our members
Making the Most out of Direct-Access Network Attached Storage
The performance of high-speed network-attached storage applications is often limited by end-system overhead,
caused primarily by memory copying and network protocol processing. In this paper, we examine alternative strategies for reducing overhead in such systems.
We consider optimizations to remote procedure call (RPC)-based data transfer using either remote direct memory access (RDMA) or network interface support
for pre-posting of application receive buffers. We demonstrate that both mechanisms enable file access throughput that saturates a 2Gb/s network link when
performing large I/Os on relatively slow, commodity PCs. However, for multi-client workloads dominated by small I/Os, throughput is limited by the per-I/O overhead of processing RPCs in the server. For such workloads, we propose the use of a new network I/O mechanism, Optimistic RDMA (ORDMA). ORDMA is an alternative to RPC that aims to improve server throughput and response time for small I/Os. We measured performance improvements of up to 32% in server throughput and 36% in response time with the use of ORDMA in our prototype.
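The "optimistic" idea can be sketched as a client that first attempts a direct write into a buffer the server has pre-arranged, falling back to a conventional RPC only when no such buffer is available, so small I/Os avoid per-RPC server CPU work on the common path. This toy model is purely illustrative; the classes, return values, and fallback policy are assumptions, not the paper's actual wire protocol.

```python
# Toy model of an optimistic direct-write path versus an RPC fallback.
# Only the fallback path consumes server CPU (counted in rpc_count);
# the optimistic path writes directly into a pre-arranged buffer.

class Server:
    def __init__(self):
        self.posted = []            # pre-arranged receive buffers
        self.rpc_count = 0          # RPCs cost per-I/O server CPU work

    def post_buffer(self, buf):
        self.posted.append(buf)

    def rpc_write(self, data):
        self.rpc_count += 1         # server CPU processes the RPC
        return "rpc"

def client_write(server, data):
    if server.posted:               # optimistic path: no server CPU
        buf = server.posted.pop()
        buf[:len(data)] = data
        return "ordma"
    return server.rpc_write(data)   # fallback: conventional RPC

s = Server()
s.post_buffer(bytearray(16))
assert client_write(s, b"hello") == "ordma"  # direct write succeeded
assert client_write(s, b"hello") == "rpc"    # no buffer left: fallback
assert s.rpc_count == 1
```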
CumuloNimbo: A highly-scalable transaction processing platform as a service.
One of the main challenges facing next generation Cloud platform services is the need to simultaneously achieve ease of programming, consistency, and high scalability. Big Data applications have so far focused on batch processing. The next step for Big Data is to move to the online world. This shift will raise the requirements for transactional guarantees. CumuloNimbo is a new EC-funded project led by Universidad Politécnica de Madrid (UPM) that addresses these issues via a highly scalable multi-tier transactional platform as a service (PaaS) that bridges the gap between OLTP and Big Data applications.
Scalable Storage Support for Data Stream Processing
Continuous data stream processing systems have offered limited support for data persistence in the past, for three main reasons: First, online, real-time queries examine current streaming data and (under the assumption of no server failures) do not require access to past data; second, stable storage devices are commonly thought to constrain system throughput and response times when compared to main memory, and are thus kept off the common path; finally, the scalable storage solutions that would be required to sustain high data streaming rates have not been thoroughly investigated in the past. Our work advances the state of the art by providing data streaming systems with a scalable path to persistent storage. This path has low impact on the performance properties of a scalable streaming system and allows two fundamental enhancements to their capabilities: First, it allows stream persistence for reference/archival purposes (in other words, queries can now be applied to past data on demand); second, fault tolerance is achievable by checkpointing and stream replay schemes that are not constrained by the size of main memory.
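The checkpoint-and-replay enhancement described above can be sketched as follows: the stream is additionally appended to persistent storage, an operator periodically checkpoints its (offset, state) pair, and recovery restores the checkpoint and replays only the archived suffix. The names and structure below are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of checkpoint-and-replay recovery over an archived
# stream. A Python list stands in for scalable stable storage; the
# operator is a trivial counter so recovery is easy to verify.

class ArchivedStream:
    def __init__(self):
        self.log = []               # stands in for persistent storage

    def append(self, item):
        self.log.append(item)

    def replay_from(self, offset):
        return self.log[offset:]

class CountingOperator:
    def __init__(self):
        self.count = 0              # operator state
        self.offset = 0             # position in the stream

    def process(self, item):
        self.count += 1
        self.offset += 1

    def checkpoint(self):
        return (self.offset, self.count)

    def restore(self, snap, archive):
        self.offset, self.count = snap
        # Replay is bounded by the archive, not by main-memory size.
        for item in archive.replay_from(self.offset):
            self.process(item)

archive = ArchivedStream()
op = CountingOperator()
for i in range(5):
    archive.append(i)
    op.process(i)
snap = op.checkpoint()              # taken at offset 5

for i in range(5, 8):               # more data arrives, then op "fails"
    archive.append(i)

fresh = CountingOperator()
fresh.restore(snap, archive)        # recover: restore, then replay suffix
assert fresh.count == 8
```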
CassMail: A Scalable, Highly-Available, and Rapidly-Prototyped E-Mail Service
In this paper we present the design and implementation of a scalable e-mail service over the Cassandra eventually-consistent storage system. Our system provides a working implementation of the SMTP and POP3 protocols and our evaluation shows that the system exhibits scalable performance, high availability, and easy manageability under write-intensive e-mail workloads. The design and implementation of our system is centered around a synthesis of interoperable components for rapid prototyping and deployment. Besides offering a proof of concept of such an approach to prototyping distributed applications, we further make two key contributions in this paper: First, we provide a detailed evaluation of the configuration and tuning of the underlying storage engine necessary to achieve scalable application performance. Second, we show that the availability of scalable storage systems such as Cassandra simplifies the design and implementation of higher-level scalable services, especially when compared to the effort expended in projects with similar goals in the past (e.g., Porcupine). We believe that the existence of infrastructural services such as Cassandra brings us closer to the vision of a universal toolbox for rapidly prototyping arbitrary scalable services.
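The layering the abstract describes, SMTP delivery writing into a scalable store and POP3 retrieval reading from it, can be sketched as below. A dictionary stands in for Cassandra's column-family store; the class and method names are invented for illustration and are not CassMail's actual schema or API.

```python
# Illustrative mail-store layer: SMTP delivery appends a message under
# a (mailbox, message-id) key; POP3 LIST/RETR/DELE read and delete it.
import uuid

class MailStore:
    def __init__(self):
        self.rows = {}              # mailbox -> {msg_id: body}

    def deliver(self, mailbox, body):
        # SMTP path: write-heavy, one row insertion per message.
        msg_id = str(uuid.uuid4())
        self.rows.setdefault(mailbox, {})[msg_id] = body
        return msg_id

    def list_messages(self, mailbox):
        # POP3 LIST: enumerate the mailbox's message ids.
        return sorted(self.rows.get(mailbox, {}))

    def retrieve(self, mailbox, msg_id):
        # POP3 RETR: fetch one message body.
        return self.rows[mailbox][msg_id]

    def delete(self, mailbox, msg_id):
        # POP3 DELE: remove the message row.
        del self.rows[mailbox][msg_id]

store = MailStore()
mid = store.deliver("alice", b"Subject: hi\r\n\r\nhello")
assert store.list_messages("alice") == [mid]
assert store.retrieve("alice", mid).startswith(b"Subject:")
```

Mapping the protocol operations onto simple row reads and writes is exactly what lets a store like Cassandra carry the scalability and availability burden, which is the paper's point about simplified higher-level service design.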
Distributed Applications and Interoperable Systems: 14th IFIP WG 6.1 International Conference, DAIS 2014, Held as Part of the 9th International Federated Conference on Distributed Computing Techniques, DisCoTec 2014, Berlin, Germany, June 3-5, 2014, Proceedings
Book Front Matter of LNCS 846